Test Drive of Discover/3000
by
Shawn M. Gordon
President SMGA
What can it do?
Discover/3000 is an extension of a Y2K tool that Impact Digital Solutions was selling, and it shows in the commands and focus of the product. That’s not to say it isn’t useful; I mention it just to give you a quick background on its origin. At its heart, Discover is a sophisticated search tool. You can search through any type of file, including an IMAGE data set, looking for various types of data. Discover is very big on allowing you to create custom pattern matches for both inclusion and exclusion to facilitate your search.
For example, with a few simple commands, you can locate all of the records that contain references to user-defined search strings. You can define up to 32 different “wildcard” character strings and up to 64 “excluded” strings, so you don’t have to see VALIDATE when you are looking for DATE. You can scan a variety of file types either alone or as a group. One limitation is that Discover can only scan files with records of up to 1024 bytes. Both summary and detail reports are produced based on the results of the search.
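The interplay of search strings and excluded strings can be sketched in a few lines of Python. This is only an illustration of the idea, not Discover's actual matching logic: the function name `find_hits`, the case-insensitive comparison, and the rule that a hit is dropped only when it sits inside an excluded string are all assumptions of mine.

```python
def find_hits(lines, search, excluded):
    """Return (line_number, text) pairs where `search` occurs outside
    every excluded string. Case-insensitive matching is an assumption."""
    hits = []
    needle = search.upper()
    for number, text in enumerate(lines, start=1):
        upper = text.upper()
        # Collect the spans covered by excluded strings such as VALIDATE.
        blocked = []
        for word in excluded:
            w = word.upper()
            start = upper.find(w)
            while start != -1:
                blocked.append((start, start + len(w)))
                start = upper.find(w, start + 1)
        # Keep the line if any occurrence of the needle lies outside all blocked spans.
        pos = upper.find(needle)
        while pos != -1:
            end = pos + len(needle)
            if not any(b <= pos and end <= e for b, e in blocked):
                hits.append((number, text))
                break
            pos = upper.find(needle, pos + 1)
    return hits


sample = [
    "MOVE ORDER-DATE TO WS-DATE.",
    "PERFORM VALIDATE-INPUT.",
    "IF VALIDATE-FLAG = 1 MOVE DUE-DATE TO OUT-REC.",
]
result = find_hits(sample, "DATE", ["VALIDATE"])
```

In this made-up sample the second line is skipped because its only DATE is buried in VALIDATE, while the third line is kept because DUE-DATE is a genuine reference.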
Discover automatically recognizes COBOL, JOB, and COPYLIB source files. The text matching capabilities can be used to analyze documents, source code, JOB streams, UDCs, command files, and pretty much any other text-based file. Discover can also locate specific text references such as names, cities, countries, depot codes, distribution centers, etc.
Discover can also scan IMAGE data sets to identify the data items that contain user-definable dates or patterns. You can analyze an entire IMAGE database in one pass or just one set at a time. Over 100 date formats and types are recognized, which is a heck of a lot more than I can think of. Empty dates and patterns, and specially coded dates such as all 9’s, are also recognized. You are also able to define what is empty or special for each type and format of date. And finally, you can specify a percent certainty for a date or pattern to be considered valid. This kind of fuzzy logic is pretty slick.
You can validate and extract data from MPE files and IMAGE data sets that either match or do not match a user defined pattern or date. Then you can convert and reformat dates in over 100 different formats and data types. Discover also has a PREVIEW mode that allows you to do a “dry run” on your file to check the results prior to an actual date conversion.
How does it work?
Discover is a command line application that works rather like Suprtool or Warehouse. Namely you set up all your pattern matches, parameters and search criteria, then execute it. You can also save the script so that it is executable again and again. There is a nifty little Reflection Basic program that will allow you to click and pick keywords from a pop-up window, but since I don’t have or use Reflection, I couldn’t test this feature.
There aren’t any magic tricks going on in Discover like MR NOBUF I/O or special sort routines, but because of the nature of the scripting engine it’s about as fast as a program written in a 3GL. The Discover interpreter is also a process handling environment, so you can issue MPE commands inside the tool. Not only can you issue MPE commands, but they can also be saved and used as part of a script; you just have to preface the command with a colon:
use extract.script
SCRIPT
HEAD
* SCRIPT to build a file of the names of all of the extract files from
* the STRING SEARCH commands. User is prompted for the type of source
****************** define SCRIPT items ******************
IT FNAME X26
IT STYPE X08
IT SOURCETYPE X08
:ECHO Script to build a file of the names of all of the D3K
:ECHO EXTRACT files of a particular type
:PURGE XLIST >$null
:PURGE YLIST,TEMP >$null
* The input file must be permanent
:BUILD XLIST;REC=-36,,F,ASCII;DISC=10000
:FILE XLIST=XLIST,OLD
***************** get the desired source type ****************
:SETVAR SOURCETYPE "COBOL"
:INPUT SOURCETYPE;PROMPT="Enter SOURCE type of extract files [COBOL] "
:SETVAR SOURCETYPE LTRIM(UPS("!SOURCETYPE "))
:SETVAR SPACEPOS 0
:SETVAR SPACEPOS POS(" ","!SOURCETYPE",1)
:IF SPACEPOS > 0
: SETVAR SOURCETYPE STR("!SOURCETYPE",1,SPACEPOS-1)
:ENDIF
:ECHO Looking for !SOURCETYPE extract files
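If you don't read MPE command language every day, the SETVAR/POS/STR lines in the script excerpt boil down to "upshift the reply, drop leading blanks, and keep only the first word". Here is my reading of those steps as a small Python sketch; the function name `normalize_source_type` is made up for illustration, and I have not tried to reproduce what INPUT does with an empty reply.

```python
def normalize_source_type(value):
    """Mimic the SETVAR/POS/STR steps: upshift, strip leading blanks,
    then keep only the text before the first blank."""
    # LTRIM(UPS("!SOURCETYPE ")): the script appends one blank before upshifting.
    text = (value + " ").upper().lstrip(" ")
    # POS(" ", ...) is 1-based and returns 0 when nothing is found;
    # str.find is 0-based and returns -1, so adding 1 lines the two up.
    space_pos = text.find(" ") + 1
    if space_pos > 0:
        # STR(text, 1, SPACEPOS-1): the first SPACEPOS-1 characters.
        text = text[: space_pos - 1]
    return text


first = normalize_source_type("  cobol extra")
second = normalize_source_type("job")
```

Appending the blank before searching means a single-word reply is handled by the same branch as a multi-word one.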
Features
I was struck by how similar the settings syntax appeared to be to Quiz. It could just be a coincidence, but here is a little snippet so you can judge for yourself.
DEFINE MACHINE HP927LX
DEFINE COMPANY
DEFINE SLIST SLIST.IDSINC
DEFINE DLIST DLIST.IDSINC
DEFINE PLIST PLIST.IDSINC
DEFINE FLIST FLIST.IDSINC
DEFINE EXTRACT EXTRACT.IDSINC
SET PROMPT YES
SET PREVIEW NO
DATES ASCII BINARY PACKED
SET DELIMITER /
SET ASCII SAME
SET PACKED SAME
MATCH FIRST
LOG
LOG EMPTY
LOG SPECIAL
ANLR1 19 62/99
CCYYRANGE 1962/1999
CNVR1 19 10/99
CNVR2 20 00/09
The pattern match is the heart and strength of Discover. With it you can specify all sorts of fun things to look for, as well as a ranking for the probability of a match.
To locate telephone numbers:
DEFINE PATTERN (###)###-####
DEFINE PATTERN (###) ###-####
DEFINE PATTERN ###-###-####
DEFINE PATTERN 1-###-###-####
DEFINE PATTERN ###.###.####
DEFINE PATTERN (###)###.####
To locate social security numbers or tax ids:
DEFINE PATTERN ###-##-####
DEFINE PATTERN ##-#######
You can almost think of these as COBOL edit masks, but in this case Discover will find data that conforms to the pattern that you have defined. A couple of other examples are the Pattern Match Files (PMF) command for MPE files and the Pattern Match Sets (PMS) command for IMAGE data sets.
PMF AT@.D@.P@ 75 P
This would search through the file set AT@.D@.P@ looking for a 75% certainty for Patterns that had been defined.
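To make the '#' masks above a little more concrete, here is a rough Python sketch that turns a mask into a regular expression. It assumes that '#' stands for exactly one digit and that every other character must match literally; that is my interpretation of the examples, not something I verified against Discover's matcher, and the names `mask_to_regex` and `classify` are hypothetical.

```python
import re


def mask_to_regex(mask):
    """Translate a '#' mask into a compiled regular expression.
    Assumption: '#' is one digit; every other character is literal."""
    parts = [r"\d" if ch == "#" else re.escape(ch) for ch in mask]
    # Lookarounds keep a mask from matching inside a longer run of digits.
    return re.compile(r"(?<!\d)" + "".join(parts) + r"(?!\d)")


masks = ["(###)###-####", "###-###-####", "###-##-####"]
compiled = [(m, mask_to_regex(m)) for m in masks]


def classify(text):
    """Return the masks that occur somewhere in `text`."""
    return [m for m, rx in compiled if rx.search(text)]


phone = classify("Call (415)642-8015 today")
tax_id = classify("SSN 123-45-6789")
```

Each mask matches only its own layout here, so a phone number in one format is not mistaken for a social security number.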
The “certainty” percentage is a little confusing, to quote from the manual:
The PATTERN MATCH FILES and PATTERN MATCH SETS commands require you to specify a percent certainty that represents the ratio of valid dates or patterns to all dates or patterns at a specific location in a data file or data set.
This percent allows DISCOVER/3000™ to locate dates and patterns where some or even most of the data is empty or invalid. It is expressed as a percent.
A high value for certainty may not locate valid date fields if some of the dates are invalid, whereas a low value may result in ‘false’ locating of dates. The DATE ranges and the certainty must be defined to produce the ‘best’ results.
A good percent to start with is 80.
For DATES
CERTAINTY = (# valid dates) / (# non-empty dates)
For PATTERNS
CERTAINTY = (# valid patterns) / (# non-empty patterns)
While this seems like an interesting feature, I wasn’t entirely sure how to apply it to my tests.
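Working the manual's ratio through by hand helped me a bit, so here is a small Python sketch of it. The sample values, the YYMMDD layout, the 62 to 99 year range (borrowed from the settings snippet earlier), and the choice to treat blanks and all zeros as "empty" are assumptions for illustration only.

```python
def certainty(values, is_valid, is_empty):
    """Percent of non-empty values that are valid, per the manual's ratio."""
    non_empty = [v for v in values if not is_empty(v)]
    if not non_empty:
        return 0.0
    valid = sum(1 for v in non_empty if is_valid(v))
    return 100.0 * valid / len(non_empty)


def is_empty(value):
    # Assumption: blanks and all zeros count as "empty".
    return value.strip() in ("", "000000")


def is_valid(value):
    # Assumption: a YYMMDD field with the year limited to 62..99.
    if len(value) != 6 or not value.isdigit():
        return False
    yy, mm, dd = int(value[0:2]), int(value[2:4]), int(value[4:6])
    return 62 <= yy <= 99 and 1 <= mm <= 12 and 1 <= dd <= 31


column = ["980115", "971231", "", "000000", "990704",
          "ABCDEF", "961301", "950228", "940630", "930101"]
score = certainty(column, is_valid, is_empty)
```

With six valid dates out of eight non-empty values the score is 75 percent, so under this reading the field would be reported at a certainty setting of 75 but missed at the suggested starting value of 80.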
One of the more powerful, and labor intensive, features of Discover is the DISCRIPT scripting language.
SCRIPT
HEADER
Header commands
BODY
Body commands
TRAILER
Trailer commands
ENDSCRIPT
The HEADER commands define the data items, the input file, and the output file. HEADER commands can also write records to the output file before processing the records in the input file. You could use this as a report writer type feature if you wanted. The maximum record size of the output file is 1024 bytes. This section is where you would put your file equates and the :BUILD or :COPY command to create the output file. The output file MUST exist prior to the first WRITE statement.
BODY commands are processed for each record in an input file. The records are written to the output file depending on the logic of the BODY commands. The command to write an output record is, logically enough, WRITE. The output file record is created via NEWREC commands. Multiple output records or no output records may be produced for each input record. If an input file has not been specified prior to the first BODY command, you are prompted for an existing input file at execution time.
The TRAILER commands are processed after all records are read from the input file or when an EXIT command is encountered in the BODY section. The EXIT directive gives you an option for dumping out of a BODY loop based on some criteria that you can specify.
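The HEADER/BODY/TRAILER flow described above is easy to picture as a loop. The following Python sketch models only the control flow; it is not Discover syntax, and the names `run_script` and `Exit` plus the sample functions are hypothetical.

```python
class Exit(Exception):
    """Raised by a body function to stop reading input, like EXIT."""


def run_script(records, header, body, trailer):
    """Run header once, body once per input record, then trailer once."""
    output = []
    header(output)                 # may write records before any input is read
    for record in records:
        try:
            body(record, output)   # may write zero, one, or many records
        except Exit:
            break                  # skip the remaining input
    trailer(output)                # runs at end of input or after an exit
    return output


def header(out):
    out.append("REPORT")


def body(record, out):
    if record == "END":
        raise Exit
    if record.strip():             # blank input lines produce no output record
        out.append(record.upper())


def trailer(out):
    out.append("RECORDS WRITTEN: %d" % (len(out) - 1))


result = run_script(["alpha", "", "beta", "END", "gamma"], header, body, trailer)
```

In this run the blank record writes nothing, the END record triggers the early exit, and the trailer still gets its turn to write a summary line.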
I don’t want to regurgitate the entire scripting manual, but there is support for variable declarations and conditional constructs (IF..ELSE..ENDIF), as well as string manipulation for changing the value of items and updating items. There are a number of predefined variables such as $DATE and $TIME.
Installation and Documentation
Installation is simple and straightforward. Restore the installation job and stream it; this will create the accounting structure, restore the files, and then very kindly clean up after itself.
There are two manuals: the user guide, which weighs in at 175 pages, and the scripting guide, which is 45 pages. Both are well written and easy to use. The user guide is organized and has a table of contents and such, while the scripting guide is more of an alphabetized reference manual with some samples at the back.
There are a couple of nits that I would pick with Discover. The manual recommends that you set your screen display to 126 columns by 36 rows and gives directions for doing it in Reflection. They also include an RBA (Reflection Basic) script that allows you to pop in Discover keywords from a pick list. While this is a nice touch, it again leaves out the huge number of people using other emulators.
The Test Drive
The paradigm employed by Discover is a little different from what I’m used to, but it only took a short time to become familiar with it. I spent some time scanning through my source code and databases looking for particular records. Since one of my projects is an email system, I was able to do lots of date and pattern matching with this type of data.
Since I didn’t have any real world problems to solve, I mostly just did some random research to test the various features of the product. Everything worked as advertised, and performance was just fine.
Conclusions
I was a little skeptical of Discover at first, as it was unclear what it was trying to be. After spending some time with it I came away with two basic impressions. First the bad: there are an awful lot of kind of odd, arbitrary limitations in the product, like the number of search and exclusion parameters, the record width of a file being read, and the length of commands in a job stream. It’s not that much harder to effectively remove those limitations.
That said, Discover is one of those tools that is probably a hard sell, because you don’t really realize how useful it is until you get used to using it and take advantage of it to solve your problems. There is a significant amount of power and versatility in Discover, but it’s not something that everyone is going to need. Being able to do sophisticated text and data searches as well as data transformations is a pretty slick thing to have. I suggest you spend some time analyzing what kinds of work you do for troubleshooting; it might be that Discover will help you.
Road Report
Discover/3000 version 1.24
Impact Digital Solutions
130 Bradford St.
San Francisco, Ca. 94110
Phone (415) 642-8015
FAX (415) 282-1947
email: info@idswest.com
http://www.idswest.com
Discover/3000 includes all the software required to run on your HP e3000. If you have WRQ Reflection you can also use their RBS file for a command picklist. A diskette is included that contains MS Word copies of the manual.
Discover/3000 runs on all versions of MPE/iX. Pricing on the software for the HP e3000 is per copy: the first copy is $1,800 and additional copies are $900. Support is $325 and $160 per year, respectively, and includes phone-in and electronic support as well as new releases of the software. All prices are in US dollars.